Mining Association Rules from Unstructured Documents

Abstract

This paper presents EART (Extract Association Rules from Text), a system for discovering association rules from collections of unstructured documents. EART processes only text, not images or figures, and discovers association rules among the keywords that label a collection of textual documents. Its main characteristic is that it integrates XML technology (to transform unstructured documents into structured documents) with an Information Retrieval scheme (TF-IDF) and a Data Mining technique for association rule extraction. EART relies on word features to extract association rules. It consists of four phases: a structure phase, an index phase, a text mining phase, and a visualization phase. Our analysis of the keywords in the extracted association rules is based on the co-occurrence of the keywords in one sentence of the original text and on the presence of the keywords in one sentence without co-occurrence. Experiments were applied to a collection of scientific documents selected from MEDLINE that relate to the outbreak of the H5N1 avian influenza virus.

Keywords: Association rules, information retrieval, knowledge discovery in text, text mining.
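The index and text mining phases described above combine TF-IDF keyword labeling with association rule mining over keywords. The Python sketch below is not the EART implementation; it is a minimal illustration under stated assumptions: documents are plain strings (the XML structure phase and the visualization phase are skipped), tokenization is a naive regex with a small hypothetical stopword list, each document is labeled with its top-k TF-IDF terms, and rules relate single keywords using the standard support and confidence measures, found by brute-force pair counting rather than full Apriori. The helper names (`top_keywords`, `pair_rules`, `cooccurrence_stats`) and the threshold values are hypothetical. The last helper is one possible reading of the sentence-level analysis: it compares how often a rule's two keywords share a sentence with how often they merely appear in the same document.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical minimal stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "was",
             "were", "with", "for", "on", "by"}


def tokenize(text):
    """Lowercase alphanumeric tokens with stopwords removed (naive regex tokenizer)."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def top_keywords(docs, k=3):
    """Label each document with its k highest-scoring TF-IDF terms."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for toks in token_lists for term in set(toks))
    labels = []
    for toks in token_lists:
        tf = Counter(toks)
        total = len(toks) or 1
        # TF-IDF = (term count / document length) * log(N / df).
        scores = {t: (c / total) * math.log(n / df[t]) for t, c in tf.items()}
        # Sort by descending score, breaking ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        labels.append(set(ranked[:k]))
    return labels


def pair_rules(labels, min_support=0.5, min_confidence=0.6):
    """Mine rules X -> Y between single keywords by counting keyword pairs."""
    n = len(labels)
    item_count = Counter(t for lab in labels for t in lab)
    pair_count = Counter(p for lab in labels for p in combinations(sorted(lab), 2))
    rules = []
    for (a, b), c in pair_count.items():
        support = c / n  # fraction of documents labeled with both keywords
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = c / item_count[x]  # support(X and Y) / support(X)
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


def cooccurrence_stats(docs, x, y):
    """Return (sentences containing both x and y, documents containing both)."""
    same_sentence = 0
    same_document = 0
    for doc in docs:
        doc_tokens = set(tokenize(doc))
        if x in doc_tokens and y in doc_tokens:
            same_document += 1
        # Naive sentence split on terminal punctuation.
        for sentence in re.split(r"[.!?]+", doc):
            toks = set(tokenize(sentence))
            if x in toks and y in toks:
                same_sentence += 1
    return same_sentence, same_document
```

For example, `cooccurrence_stats(["H5N1 virus spread fast. The virus mutated.", "H5N1 cases rose. A virus was found."], "h5n1", "virus")` returns `(1, 2)`: both keywords occur in both documents, but they share a sentence only once. Keeping the three steps as separate functions mirrors the separation between indexing, rule mining, and rule analysis, so each step can be swapped out (for instance, replacing pair counting with a full Apriori pass for larger itemsets).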

Similar resources

CSCR001: Literature Survey

My PhD research focuses on Text Mining (TM), one major school in Knowledge Discovery in Data (KDD), and in particular the task of classification/categorization of documents using novel algorithms for the identification of hidden patterns within these documents. Two significant techniques of Data Mining (DM), another well-known major school in KDD, will be utilized to support the research: Assoc...

A Strategy to Compromise Handwritten Documents Processing and Retrieving Using Association Rules Mining

A massive amount of new information is being created and the world's data doubles every 18 months; 80-90% of all data is held in various unstructured formats. Useful information can be derived from this unstructured data. The aim of this research is to present a framework for handling handwritten documents in all its trends. Since handwritten documents are unstructured data, the objectives of...

Text Mining: Extraction of Interesting Association Rule with Frequent Itemsets Mining for Korean Language from Unstructured Data

Text mining is a specific method for extracting knowledge from structured and unstructured data. The knowledge extracted by the text mining process can be used for further use and discovery. This paper presents a method for extracting information from unstructured text data and the importance of Association Rules Mining, specifically for the Korean language (text), and also NLP (Natural Language ...

Relevant Characteristics Extraction from Semantically Unstructured Data (PhD thesis title: "Data Mining for Unstructured Data")

1 Introduction. Most real-world data collections are in text format. Such data are considered semi-structured because they have a small amount of organized structure. Work on modeling and implementation over semi-structured data from recent databases has grown continually in recent years. Moreover, information retrieval applications, such as indexing methods for text documents, have been adapted in order to work...

Textmining: Generating association rules from textual data

Textmining is an emerging research area whose goal is to discover additional information from hidden patterns in large unstructured textual collections. Hence, given a collection of text documents, most text mining approaches perform knowledge-discovery operations on labels associated with each document, which are usually keywords that represent the result of non-trivial keyword-labeling pro...

Publication date: 2012